Abstract. We generalize the partial derivative automaton to regular
expressions with shuffle and study its size in the worst and in the average
case. The number of states of the partial derivative automaton is in the
worst case at most $2^m$, where $m$ is the number of letters in the expression,
while asymptotically and on average it is no more than $(4/3)^m$.


1 Introduction 

The class of regular languages is closed under the shuffle (or interleaving)
operation, and extended regular expressions with shuffle can be much more
succinct than the equivalent ones with disjunction, concatenation, and star
operators. For the shuffle operation, Mayer and Stockmeyer studied the
computational complexity of membership and inequivalence problems.
Inequivalence is exponential time complete, and membership is NP-complete for
some classes of regular languages. In particular, they showed that for regular
expressions (REs) with shuffle, of size $n$, an equivalent nondeterministic
finite automaton (NFA) needs at most $2^n$ states, and presented a family of
REs with shuffle, of size $O(n)$, for which the corresponding NFAs have at
least $2^n$ states. Gelade [10], and Gruber and Holzer showed that there
exists a double exponential trade-off in the translation from REs with shuffle
to standard REs. Gelade also gave a tight double exponential upper bound for
the translation of REs with shuffle to DFAs. Recently, conversions of shuffle
expressions to finite automata were presented by Estrade [7] and Kumar and
Verma. In the latter paper the authors give an algorithm for the construction
of an $\varepsilon$-free NFA based on the classic Glushkov construction, and
claim that the size of the resulting automaton is at most $2^{m+1}$, where $m$
is the number of letters that occur in the RE with shuffle.

In this paper we present a conversion of REs with shuffle to
$\varepsilon$-free NFAs, by generalizing the partial derivative construction
for standard REs [1,15]. For standard REs, the partial derivative automaton
($\mathcal{A}_{pd}$) is a quotient of the Glushkov automaton
($\mathcal{A}_{pos}$), and Broda et al. showed that, asymptotically and on
average, the size of $\mathcal{A}_{pd}$ is half the size of
$\mathcal{A}_{pos}$. In the case of REs with shuffle we show that the number
of states of the partial derivative automaton is in the worst case $2^m$ (with
$m$ as before) and an upper bound for the average size is $(4/3)^m$.

This paper is organized as follows. In the next section we review the shuffle
operation and regular expressions with shuffle. In Section 3 we consider
equation systems, for languages and expressions, associated to
nondeterministic finite automata and define a solution for a system of
equations for a shuffle expression. An alternative and equivalent
construction, denoted by $\mathcal{A}_{pd}$, is given in Section 4 using the
notion of partial derivative. An upper bound for the average number of states
of $\mathcal{A}_{pd}$, obtained using the framework of analytic combinatorics,
is given in Section 5. We conclude in Section 6 with some considerations about
how to improve the presented upper bound and related future work.

2 Regular Expressions with Shuffle 

Given an alphabet $\Sigma$, the shuffle of two words in $\Sigma^*$ is a finite
set of words defined inductively as follows, for $x, y \in \Sigma^*$ and
$a, b \in \Sigma$:

$$x \shuffle \varepsilon = \varepsilon \shuffle x = \{x\}$$

$$ax \shuffle by = \{\, az \mid z \in x \shuffle by \,\} \cup \{\, bz \mid z \in ax \shuffle y \,\}.$$

This definition is extended to sets of words, i.e. languages, in the natural
way:

$$L_1 \shuffle L_2 = \{\, x \shuffle y \mid x \in L_1,\ y \in L_2 \,\}.$$
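The inductive definition above translates almost literally into a recursive
procedure. The following Python sketch is only an illustration: the names
`shuffle_words` and `shuffle_languages` are hypothetical, words are modelled as
Python strings, and languages are assumed to be finite sets of strings.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def shuffle_words(x: str, y: str) -> frozenset:
    """Return the set of all interleavings of the words x and y."""
    if not x:
        return frozenset({y})
    if not y:
        return frozenset({x})
    # Either the first letter of x or the first letter of y comes first.
    left = {x[0] + z for z in shuffle_words(x[1:], y)}
    right = {y[0] + z for z in shuffle_words(x, y[1:])}
    return frozenset(left | right)


def shuffle_languages(l1, l2) -> set:
    """Shuffle of two finite languages: union of all pairwise word shuffles."""
    result = set()
    for x in l1:
        for y in l2:
            result |= shuffle_words(x, y)
    return result
```

For instance, `shuffle_words("ab", "c")` yields the three words `abc`, `acb`
and `cab`. Memoization via `lru_cache` is a design choice of this sketch, not
something the definition requires.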

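It is well known that if two languages $L_1, L_2 \subseteq \Sigma^*$ are
regular then $L_1 \shuffle L_2$ is regular. One can extend regular expressions
to include the $\shuffle$ operator. Given an alphabet $\Sigma$, we denote by
$T_{\shuffle}$ the set containing $\emptyset$ plus all terms finitely
generated from $\Sigma \cup \{\varepsilon\}$ and operators $+$, $\cdot$,
$\shuffle$, ${}^*$, that is, the expressions $\tau$ generated by the grammar

$$\tau \to \emptyset \mid \alpha \qquad (1)$$

$$\alpha \to \varepsilon \mid a \mid \alpha + \alpha \mid \alpha \cdot \alpha \mid \alpha \shuffle \alpha \mid \alpha^* \quad (a \in \Sigma). \qquad (2)$$

As usual, the (regular) language $\mathcal{L}(\tau)$ represented by an
expression $\tau \in T_{\shuffle}$ is inductively defined as follows:
$\mathcal{L}(\emptyset) = \emptyset$, $\mathcal{L}(\varepsilon) = \{\varepsilon\}$,
$\mathcal{L}(a) = \{a\}$ for $a \in \Sigma$,
$\mathcal{L}(\alpha^*) = \mathcal{L}(\alpha)^*$,
$\mathcal{L}(\alpha + \beta) = \mathcal{L}(\alpha) \cup \mathcal{L}(\beta)$,
$\mathcal{L}(\alpha\beta) = \mathcal{L}(\alpha)\mathcal{L}(\beta)$, and
$\mathcal{L}(\alpha \shuffle \beta) = \mathcal{L}(\alpha) \shuffle \mathcal{L}(\beta)$.
We say that two expressions $\tau_1, \tau_2 \in T_{\shuffle}$ are equivalent,
and write $\tau_1 = \tau_2$, if $\mathcal{L}(\tau_1) = \mathcal{L}(\tau_2)$.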

Example 1. Consider $\alpha_n = a_1 \shuffle \cdots \shuffle a_n$, where
$n \geq 1$ and $a_i \neq a_j$ for $1 \leq i \neq j \leq n$. Then,

$$\mathcal{L}(\alpha_n) = \{\, a_{i_1} \cdots a_{i_n} \mid i_1, \ldots, i_n \text{ is a permutation of } 1, \ldots, n \,\}.$$

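The shuffle operator $\shuffle$ is commutative, associative, and distributes
over $+$. It also follows that for all $a, b \in \Sigma$ and
$\tau_1, \tau_2 \in T_{\shuffle}$,

$$a\tau_1 \shuffle b\tau_2 = a(\tau_1 \shuffle b\tau_2) + b(a\tau_1 \shuffle \tau_2).$$

Given an expression $\tau$, we define
$\varepsilon(\tau) = \varepsilon(\mathcal{L}(\tau))$, where, for a language
$L$, $\varepsilon(L) = \varepsilon$ if $\varepsilon \in L$ and
$\varepsilon(L) = \emptyset$ otherwise. A recursive definition of
$\varepsilon : T_{\shuffle} \to \{\emptyset, \varepsilon\}$ is given by the
following: $\varepsilon(a) = \varepsilon(\emptyset) = \emptyset$,
$\varepsilon(\varepsilon) = \varepsilon(\alpha^*) = \varepsilon$,
$\varepsilon(\alpha + \beta) = \varepsilon(\alpha) + \varepsilon(\beta)$,
$\varepsilon(\alpha\beta) = \varepsilon(\alpha)\varepsilon(\beta)$, and
$\varepsilon(\alpha \shuffle \beta) = \varepsilon(\alpha)\varepsilon(\beta)$.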
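The recursive definition of $\varepsilon$ can be sketched as a Boolean test on
a small syntax tree. The tuple encoding below (tags `"empty"`, `"eps"`,
`"sym"`, `"alt"`, `"cat"`, `"shuf"`, `"star"`) and the function name `nullable`
are assumptions made for this illustration only; `True` stands for the value
$\varepsilon$ and `False` for $\emptyset$.

```python
# Hypothetical tuple encoding of expressions with shuffle.
EMPTY = ("empty",)
EPS = ("eps",)


def sym(a): return ("sym", a)
def alt(x, y): return ("alt", x, y)
def cat(x, y): return ("cat", x, y)
def shuf(x, y): return ("shuf", x, y)
def star(x): return ("star", x)


def nullable(t) -> bool:
    """True iff the empty word belongs to the language of t."""
    tag = t[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(t[1]) or nullable(t[2])
    # Concatenation and shuffle behave alike: both operands must be nullable.
    return nullable(t[1]) and nullable(t[2])
```

Under this encoding, `nullable(shuf(star(sym("a")), EPS))` holds, whereas
`nullable(shuf(sym("a"), EPS))` does not.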

3 Automata and Systems of Equations

We first recall the definition of an NFA as a tuple
$A = (S, \Sigma, S_0, \delta, F)$, where $S$ is a finite set of states,
$\Sigma$ is a finite alphabet, $S_0 \subseteq S$ a set of initial states,
$\delta : S \times \Sigma \to \mathcal{P}(S)$ the transition function, and
$F \subseteq S$ a set of final states. The extension of $\delta$ to sets of
states and words is defined by $\delta(X, \varepsilon) = X$ and
$\delta(X, ax) = \delta(\bigcup_{s \in X} \delta(s, a), x)$. A word
$x \in \Sigma^*$ is accepted by $A$ if and only if
$\delta(S_0, x) \cap F \neq \emptyset$. The language of $A$ is the set of
words accepted by $A$ and is denoted by $\mathcal{L}(A)$. The right language
of a state $s$, denoted by $\mathcal{L}_s$, is the language accepted by $A$ if
we take $S_0 = \{s\}$. The class of languages accepted by all the NFAs is
precisely the set of regular languages.

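It is well known that, for each $n$-state NFA $A$ over
$\Sigma = \{a_1, \ldots, a_k\}$, having right languages
$\mathcal{L}_1, \ldots, \mathcal{L}_n$, it is possible to associate a system
of linear language equations

$$\mathcal{L}_i = a_1 \mathcal{L}_{1i} \cup \cdots \cup a_k \mathcal{L}_{ki} \cup \varepsilon(\mathcal{L}_i), \quad i \in [1, n],$$

where each $\mathcal{L}_{ji}$ is a (possibly empty) union of elements in
$\{\mathcal{L}_1, \ldots, \mathcal{L}_n\}$, and
$\mathcal{L}_1 = \mathcal{L}(A)$.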

In the same way, it is possible to associate to each regular expression a system 
of equations on expressions. We here extend this notion to regular expressions 
with shuffle. 

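Definition 2. Consider $\Sigma = \{a_1, \ldots, a_k\}$ and
$\alpha_0 \in T_{\shuffle}$. A support of $\alpha_0$ is a set
$\{\alpha_1, \ldots, \alpha_n\}$ that satisfies a system of equations

$$\alpha_i = a_1 \alpha_{1i} + \cdots + a_k \alpha_{ki} + \varepsilon(\alpha_i), \quad i \in [0, n] \qquad (3)$$

for some $\alpha_{1i}, \ldots, \alpha_{ki}$, each one a (possibly empty) sum
of elements in $\{\alpha_1, \ldots, \alpha_n\}$. In this case
$\{\alpha_0, \alpha_1, \ldots, \alpha_n\}$ is called a prebase of $\alpha_0$.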

It is clear from what was said above that the existence of a support of
$\alpha$ implies the existence of an NFA that accepts the language determined
by $\alpha$.

Note that the system of equations (3) can be written in matrix form
$A_\alpha = C \cdot M_\alpha + E_\alpha$, where $M_\alpha$ is the
$k \times (n+1)$ matrix with entries $\alpha_{ij}$, and $A_\alpha$, $C$ and
$E_\alpha$ denote respectively the following three matrices,

$$A_\alpha = [\alpha_0 \cdots \alpha_n], \quad C = [a_1 \cdots a_k], \quad \text{and} \quad E_\alpha = [\varepsilon(\alpha_0) \cdots \varepsilon(\alpha_n)],$$

where $C \cdot M_\alpha$ denotes the matrix obtained from $C$ and $M_\alpha$
by applying the standard rules of matrix multiplication, but replacing the
multiplication by concatenation. This notation will be used below.

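A support for an expression $\alpha \in T_{\shuffle}$ can be computed using
the function $\pi : T_{\shuffle} \to \mathcal{P}(T_{\shuffle})$ recursively
given by the following.

Definition 3. Given $\tau \in T_{\shuffle}$, the set $\pi(\tau)$ is
inductively defined by

$$\begin{aligned}
\pi(\emptyset) = \pi(\varepsilon) &= \emptyset, & \pi(\alpha + \beta) &= \pi(\alpha) \cup \pi(\beta), \\
\pi(a) &= \{\varepsilon\} \quad (a \in \Sigma), & \pi(\alpha\beta) &= \pi(\alpha)\beta \cup \pi(\beta), \\
\pi(\alpha^*) &= \pi(\alpha)\alpha^*, & \pi(\alpha \shuffle \beta) &= \pi(\alpha) \shuffle \pi(\beta) \\
& & &\quad \cup\ \pi(\alpha) \shuffle \{\beta\} \cup \{\alpha\} \shuffle \pi(\beta),
\end{aligned}$$

where, given $S, T \subseteq T_{\shuffle}$ and
$\beta \in T_{\shuffle} \setminus \{\emptyset, \varepsilon\}$,
$S\beta = \{\, \alpha\beta \mid \alpha \in S \,\}$ and
$S \shuffle T = \{\, \alpha \shuffle \beta \mid \alpha \in S,\ \beta \in T \,\}$,
$S\varepsilon = \{\varepsilon\} \shuffle S = S \shuffle \{\varepsilon\} = S$,
and $S\emptyset = \emptyset S = \emptyset$.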
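As an illustration of Definition 3, the following Python sketch computes
$\pi(\tau)$ on a hypothetical tuple encoding of expressions. The names
`support`, `cat_set`, `shuf_sets` and `letters` are not from the paper. The
sketch also assumes the usual simplification $\varepsilon\beta = \beta$, which
the text does not state explicitly, and it does not identify expressions modulo
associativity or commutativity.

```python
EPS = ("eps",)


def sym(a): return ("sym", a)
def alt(x, y): return ("alt", x, y)
def cat(x, y): return ("cat", x, y)
def shuf(x, y): return ("shuf", x, y)
def star(x): return ("star", x)


def cat_set(s, beta):
    """S.beta with S.eps = S and (assumed) eps.beta = beta."""
    if beta == EPS:
        return set(s)
    return {beta if x == EPS else cat(x, beta) for x in s}


def shuf_sets(s, t):
    """Element-wise shuffle of two sets of expressions, dropping eps operands."""
    out = set()
    for x in s:
        for y in t:
            if x == EPS:
                out.add(y)
            elif y == EPS:
                out.add(x)
            else:
                out.add(shuf(x, y))
    return out


def support(t):
    """The set pi(t) of Definition 3."""
    tag = t[0]
    if tag in ("empty", "eps"):
        return set()
    if tag == "sym":
        return {EPS}
    if tag == "star":
        return cat_set(support(t[1]), t)
    x, y = t[1], t[2]
    if tag == "alt":
        return support(x) | support(y)
    if tag == "cat":
        return cat_set(support(x), y) | support(y)
    # Shuffle: combine the supports with each other and with the operands.
    px, py = support(x), support(y)
    return shuf_sets(px, py) | shuf_sets(px, {y}) | shuf_sets({x}, py)


def letters(t):
    """Number of alphabet symbols occurring in t."""
    if t[0] == "sym":
        return 1
    return sum(letters(c) for c in t[1:])
```

For the shuffle of two distinct letters, `support(shuf(sym("a"), sym("b")))`
consists of $\varepsilon$, $a$ and $b$.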

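The following lemma follows directly from the definitions and will be used in
the proof of Proposition 5.

Lemma 4. If $\alpha, \beta \in T_{\shuffle}$, then
$\varepsilon(\beta) \cdot \mathcal{L}(\alpha) \subseteq \mathcal{L}(\alpha \shuffle \beta)$.

Proposition 5. If $\alpha \in T_{\shuffle}$, then the set $\pi(\alpha)$ is a
support of $\alpha$.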

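Proof. We proceed by induction on the structure of $\alpha$. Excluding the
case where $\alpha$ is $\alpha_0 \shuffle \beta_0$, the proof can be found in
the literature on standard REs. We now describe how to obtain a system of
equations corresponding to an expression $\alpha_0 \shuffle \beta_0$ from
systems for $\alpha_0$ and $\beta_0$. Suppose that
$\pi(\alpha_0) = \{\alpha_1, \ldots, \alpha_n\}$ is a support of $\alpha_0$
and $\pi(\beta_0) = \{\beta_1, \ldots, \beta_m\}$ is a support of $\beta_0$.
For $\alpha_0$ and $\beta_0$ consider $C$, $A_{\alpha_0}$, $M_{\alpha_0}$,
$E_{\alpha_0}$ and $A_{\beta_0}$, $M_{\beta_0}$, $E_{\beta_0}$ as above. We
wish to show that

$$\begin{aligned}
\pi(\alpha_0 \shuffle \beta_0) = {} & \{\alpha_1 \shuffle \beta_1, \ldots, \alpha_1 \shuffle \beta_m, \ldots, \alpha_n \shuffle \beta_1, \ldots, \alpha_n \shuffle \beta_m\} \\
& \cup \{\alpha_1 \shuffle \beta_0, \ldots, \alpha_n \shuffle \beta_0\} \cup \{\alpha_0 \shuffle \beta_1, \ldots, \alpha_0 \shuffle \beta_m\}
\end{aligned}$$

is a support of $\alpha_0 \shuffle \beta_0$. Let
$A_{\alpha_0 \shuffle \beta_0}$ be the $(n+1)(m+1)$-entry row-matrix whose
entries are

$$[\alpha_0 \shuffle \beta_0 \;\; \alpha_1 \shuffle \beta_1 \cdots \alpha_n \shuffle \beta_m \;\; \alpha_1 \shuffle \beta_0 \cdots \alpha_n \shuffle \beta_0 \;\; \alpha_0 \shuffle \beta_1 \cdots \alpha_0 \shuffle \beta_m].$$

Then, $E_{\alpha_0 \shuffle \beta_0}$ is defined as usual, i.e. containing the
values of $\varepsilon(\alpha)$ for all entries $\alpha$ in
$A_{\alpha_0 \shuffle \beta_0}$.

Finally, let $M_{\alpha_0 \shuffle \beta_0}$ be the $k \times (n+1)(m+1)$
matrix whose entries $\gamma_{l,(i,j)}$, for $l \in [1, k]$ and
$(i, j) \in [0, n] \times [0, m]$, are defined by

$$\gamma_{l,(i,j)} = \alpha_{li} \shuffle \beta_j + \alpha_i \shuffle \beta_{lj}.$$

Note that, since by the induction hypothesis each $\alpha_{li}$ is a sum of
elements in $\pi(\alpha_0)$ and each $\beta_{lj}$ is a sum of elements in
$\pi(\beta_0)$, after applying distributivity of $\shuffle$ over $+$ each
element of $M_{\alpha_0 \shuffle \beta_0}$ is in fact a sum of elements in
$\pi(\alpha_0 \shuffle \beta_0)$. We will show that
$A_{\alpha_0 \shuffle \beta_0} = C \cdot M_{\alpha_0 \shuffle \beta_0} + E_{\alpha_0 \shuffle \beta_0}$.
For this, consider $\alpha_i \shuffle \beta_j$ for some
$(i, j) \in [0, n] \times [0, m]$. We have
$\alpha_i = a_1\alpha_{1i} + \cdots + a_k\alpha_{ki} + \varepsilon(\alpha_i)$
and
$\beta_j = a_1\beta_{1j} + \cdots + a_k\beta_{kj} + \varepsilon(\beta_j)$.
Consequently, using properties of $\shuffle$, namely distributivity over $+$,
as well as Lemma 4,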




Partial Derivative Automaton for Regular Expressions with Shuffle 




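$$\begin{aligned}
\alpha_i \shuffle \beta_j &= (a_1\alpha_{1i} + \cdots + a_k\alpha_{ki} + \varepsilon(\alpha_i)) \shuffle (a_1\beta_{1j} + \cdots + a_k\beta_{kj} + \varepsilon(\beta_j)) \\
&= a_1(\alpha_{1i} \shuffle \beta_j + \alpha_i \shuffle \beta_{1j} + \varepsilon(\beta_j)\alpha_{1i} + \varepsilon(\alpha_i)\beta_{1j}) + \cdots \\
&\quad + a_k(\alpha_{ki} \shuffle \beta_j + \alpha_i \shuffle \beta_{kj} + \varepsilon(\beta_j)\alpha_{ki} + \varepsilon(\alpha_i)\beta_{kj}) + \varepsilon(\alpha_i \shuffle \beta_j) \\
&= a_1(\alpha_{1i} \shuffle \beta_j + \alpha_i \shuffle \beta_{1j}) + \cdots + a_k(\alpha_{ki} \shuffle \beta_j + \alpha_i \shuffle \beta_{kj}) + \varepsilon(\alpha_i \shuffle \beta_j) \\
&= a_1\gamma_{1,(i,j)} + \cdots + a_k\gamma_{k,(i,j)} + \varepsilon(\alpha_i \shuffle \beta_j).
\end{aligned}$$

□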

It is clear from its definition that $\pi(\alpha)$ is finite. In the following
proposition, an upper bound for the size of $\pi(\alpha)$ is given. Example 7
is a witness that this upper bound is tight.

Proposition 6. Given $\alpha \in T_{\shuffle}$, one has
$|\pi(\alpha)| \leq 2^{|\alpha|_\Sigma} - 1$, where $|\alpha|_\Sigma$ denotes
the number of alphabet symbols in $\alpha$.

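Proof. The proof proceeds by induction on the structure of $\alpha$. It is
clear that the result holds for $\alpha = \emptyset$, $\alpha = \varepsilon$
and for $\alpha = a \in \Sigma$. Now, suppose the claim is true for $\alpha$
and $\beta$. There are four induction cases to consider. We will make use of
the fact that, for $m, n \geq 0$, one has $2^m + 2^n - 2 \leq 2^{m+n} - 1$.
For $\alpha^*$, one has
$|\pi(\alpha^*)| = |\pi(\alpha)\alpha^*| = |\pi(\alpha)| \leq 2^{|\alpha|_\Sigma} - 1 = 2^{|\alpha^*|_\Sigma} - 1$.
For $\alpha + \beta$, one has
$|\pi(\alpha + \beta)| = |\pi(\alpha) \cup \pi(\beta)| \leq 2^{|\alpha|_\Sigma} - 1 + 2^{|\beta|_\Sigma} - 1 \leq 2^{|\alpha|_\Sigma + |\beta|_\Sigma} - 1 = 2^{|\alpha + \beta|_\Sigma} - 1$.
For $\alpha\beta$, one has
$|\pi(\alpha\beta)| = |\pi(\alpha)\beta \cup \pi(\beta)| \leq 2^{|\alpha|_\Sigma} - 1 + 2^{|\beta|_\Sigma} - 1 \leq 2^{|\alpha\beta|_\Sigma} - 1$.
Finally, for $\alpha \shuffle \beta$, one has
$|\pi(\alpha \shuffle \beta)| = |\pi(\alpha) \shuffle \pi(\beta) \cup \pi(\alpha) \shuffle \{\beta\} \cup \{\alpha\} \shuffle \pi(\beta)| \leq (2^{|\alpha|_\Sigma} - 1)(2^{|\beta|_\Sigma} - 1) + 2^{|\alpha|_\Sigma} - 1 + 2^{|\beta|_\Sigma} - 1 = 2^{|\alpha|_\Sigma + |\beta|_\Sigma} - 1 = 2^{|\alpha \shuffle \beta|_\Sigma} - 1$. □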
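The two arithmetic facts used in this proof can be spelled out as follows.
The abbreviations $p = |\alpha|_\Sigma$ and $q = |\beta|_\Sigma$ are introduced
here only for readability.

```latex
% Auxiliary inequality, for m, n >= 0:
2^{m+n} - 1 - (2^m + 2^n - 2) = (2^m - 1)(2^n - 1) \geq 0 .

% Shuffle case, with p = |\alpha|_\Sigma and q = |\beta|_\Sigma:
(2^p - 1)(2^q - 1) + (2^p - 1) + (2^q - 1)
  = 2^{p+q} - 2^p - 2^q + 1 + 2^p + 2^q - 2
  = 2^{p+q} - 1 .
```

The shuffle case is thus the only one in which the bound is met with equality
at every step, which is consistent with the tightness witnessed by shuffles of
distinct letters.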

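Example 7. Considering again $\alpha_n = a_1 \shuffle \cdots \shuffle a_n$,
where $n \geq 1$ and $a_i \neq a_j$ for $1 \leq i \neq j \leq n$, one has

$$|\pi(\alpha_n)| = \Big|\Big\{\, \mathop{\shuffle}_{i \in I} a_i \;\Big|\; I \subsetneq \{1, \ldots, n\} \,\Big\}\Big| = 2^n - 1,$$

where we consider $\mathop{\shuffle}_{i \in \emptyset} a_i = \varepsilon$.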

The proof of Proposition 5 gives a way to construct a system of equations
for an expression $\tau \in T_{\shuffle}$, corresponding to an NFA that
accepts the language represented by $\tau$. This is done by recursively
computing $\pi(\tau)$ and the matrices $A_\tau$ and $E_\tau$, obtaining the
whole NFA in the final step.

In the next section we will show how to build the same NFA in a more
efficient way using the notion of partial derivative.

4 Partial Derivatives 

Recall that the left-quotient of a language $L$ w.r.t. a symbol
$a \in \Sigma$ is

$$a^{-1}L = \{\, x \mid ax \in L \,\}.$$

The left quotient of $L$ w.r.t. a word $x \in \Sigma^*$ is then inductively
defined by $\varepsilon^{-1}L = L$ and $(xa)^{-1}L = a^{-1}(x^{-1}L)$. Note
that for $L_1, L_2 \subseteq \Sigma^*$ and $a \in \Sigma$ the shuffle
operation satisfies
$a^{-1}(L_1 \shuffle L_2) = (a^{-1}L_1) \shuffle L_2 \cup L_1 \shuffle (a^{-1}L_2)$.
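The quotient identity for shuffle can be checked mechanically on finite
languages. The sketch below is only a sanity check under that finiteness
assumption; all function names are hypothetical and words are modelled as
Python strings.

```python
def shuffle_words(x: str, y: str) -> set:
    """All interleavings of the words x and y."""
    if not x:
        return {y}
    if not y:
        return {x}
    return ({x[0] + z for z in shuffle_words(x[1:], y)}
            | {y[0] + z for z in shuffle_words(x, y[1:])})


def shuffle_languages(l1, l2) -> set:
    """Shuffle of two finite languages."""
    out = set()
    for x in l1:
        for y in l2:
            out |= shuffle_words(x, y)
    return out


def left_quotient(a: str, lang) -> set:
    """Left quotient of a finite language by the single letter a."""
    return {w[1:] for w in lang if w.startswith(a)}


def quotient_identity_holds(a: str, l1, l2) -> bool:
    """Compare both sides of the quotient identity for shuffle."""
    lhs = left_quotient(a, shuffle_languages(l1, l2))
    rhs = (shuffle_languages(left_quotient(a, l1), l2)
           | shuffle_languages(l1, left_quotient(a, l2)))
    return lhs == rhs
```

A call such as `quotient_identity_holds("a", {"ab", "b", ""}, {"a", "ba"})`
compares the two sides for one concrete pair of languages; it illustrates the
identity but of course does not prove it.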

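Definition 8. The set of partial derivatives of a term
$\tau \in T_{\shuffle}$ w.r.t. a letter $a \in \Sigma$, denoted by
$\partial_a(\tau)$, is inductively defined by

$$\begin{aligned}
\partial_a(\emptyset) = \partial_a(\varepsilon) &= \emptyset \\
\partial_a(b) &= \begin{cases} \{\varepsilon\} & \text{if } b = a \\ \emptyset & \text{otherwise} \end{cases} \\
\partial_a(\alpha^*) &= \partial_a(\alpha)\alpha^* \\
\partial_a(\alpha + \beta) &= \partial_a(\alpha) \cup \partial_a(\beta) \\
\partial_a(\alpha\beta) &= \partial_a(\alpha)\beta \cup \varepsilon(\alpha)\partial_a(\beta) \\
\partial_a(\alpha \shuffle \beta) &= \partial_a(\alpha) \shuffle \{\beta\} \cup \{\alpha\} \shuffle \partial_a(\beta).
\end{aligned}$$

The set of partial derivatives of $\tau \in T_{\shuffle}$ w.r.t. a word
$x \in \Sigma^*$ is inductively defined by
$\partial_\varepsilon(\tau) = \{\tau\}$ and
$\partial_{xa}(\tau) = \partial_a(\partial_x(\tau))$, where, given a set
$S \subseteq T_{\shuffle}$,
$\partial_a(S) = \bigcup_{\tau \in S} \partial_a(\tau)$.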
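A direct transcription of Definition 8 might look as follows. This is a
sketch on a hypothetical tuple encoding of expressions; the function names
`pd` and `pd_word` are illustrative, and the simplifications
$\varepsilon\beta = \beta$ and $\varepsilon \shuffle \beta = \beta$ are applied
when building derivatives.

```python
EPS = ("eps",)


def sym(a): return ("sym", a)
def alt(x, y): return ("alt", x, y)
def cat(x, y): return ("cat", x, y)
def shuf(x, y): return ("shuf", x, y)
def star(x): return ("star", x)


def nullable(t) -> bool:
    tag = t[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(t[1]) or nullable(t[2])
    return nullable(t[1]) and nullable(t[2])


def cat_set(s, beta):
    if beta == EPS:
        return set(s)
    return {beta if x == EPS else cat(x, beta) for x in s}


def shuf_sets(s, t):
    out = set()
    for x in s:
        for y in t:
            out.add(y if x == EPS else x if y == EPS else shuf(x, y))
    return out


def pd(t, a):
    """Partial derivatives of the expression t w.r.t. the letter a."""
    tag = t[0]
    if tag in ("empty", "eps"):
        return set()
    if tag == "sym":
        return {EPS} if t[1] == a else set()
    if tag == "star":
        return cat_set(pd(t[1], a), t)
    x, y = t[1], t[2]
    if tag == "alt":
        return pd(x, a) | pd(y, a)
    if tag == "cat":
        out = cat_set(pd(x, a), y)
        if nullable(x):  # the factor eps(x) keeps or discards pd(y, a)
            out |= pd(y, a)
        return out
    return shuf_sets(pd(x, a), {y}) | shuf_sets({x}, pd(y, a))


def pd_word(t, word):
    """Partial derivatives of t w.r.t. a word, letter by letter."""
    current = {t}
    for a in word:
        current = set().union(*(pd(s, a) for s in current))
    return current
```

With `t = shuf(sym("a"), sym("b"))`, the derivative by `"a"` is the single
expression `sym("b")`, and the derivative by the word `"ab"` is
$\{\varepsilon\}$.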

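We denote by $\partial(\tau)$ the set of all partial derivatives of an
expression $\tau$, i.e.
$\partial(\tau) = \bigcup_{x \in \Sigma^*} \partial_x(\tau)$, and by
$\partial^+(\tau)$ the set of partial derivatives excluding the trivial
derivative by $\varepsilon$, i.e.
$\partial^+(\tau) = \bigcup_{x \in \Sigma^+} \partial_x(\tau)$. Given a set
$S \subseteq T_{\shuffle}$, we define
$\mathcal{L}(S) = \bigcup_{\tau \in S} \mathcal{L}(\tau)$. The following
result has a straightforward proof.

Proposition 9. Given $x \in \Sigma^*$ and $\tau \in T_{\shuffle}$, one has
$\mathcal{L}(\partial_x(\tau)) = x^{-1}\mathcal{L}(\tau)$.

The following properties of $\partial^+(\tau)$ will be used in the proof of
Proposition 11.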

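Lemma 10. For $\tau \in T_{\shuffle}$, the following hold.

1. If $\partial^+(\tau) \neq \emptyset$, then there is
$\alpha_0 \in \partial^+(\tau)$ with $\varepsilon(\alpha_0) = \varepsilon$.

2. If $\partial^+(\tau) = \emptyset$ and $\tau \neq \emptyset$, then
$\mathcal{L}(\tau) = \{\varepsilon\}$ and $\varepsilon(\tau) = \varepsilon$.

Proof. 1. From the grammar rule (2) it follows that $\emptyset$ cannot appear
as a subexpression of a larger term. Suppose that there is some
$\gamma \in \partial^+(\tau)$. We conclude, from Definition 8 and from the
previous remark, that there is some word $x \in \Sigma^+$ such that
$x \in \mathcal{L}(\gamma)$. This is equivalent to
$\varepsilon \in \mathcal{L}(\partial_x(\gamma))$, which means that there is
some $\alpha_0 \in \partial_x(\gamma) \subseteq \partial^+(\tau)$ such that
$\varepsilon(\alpha_0) = \varepsilon$.

2. $\partial^+(\tau) = \emptyset$ implies that
$\partial_x(\tau) = \emptyset$ for all $x \in \Sigma^+$. Thus,
$\mathcal{L}(\partial_x(\tau)) = \{\, y \mid xy \in \mathcal{L}(\tau) \,\} = \emptyset$,
and consequently there is no word $z \in \Sigma^+$ in $\mathcal{L}(\tau)$. On
the other hand, since $\emptyset$ does not appear in $\tau$, it follows that
$\mathcal{L}(\tau) \neq \emptyset$. Thus,
$\mathcal{L}(\tau) = \{\varepsilon\}$. □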

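Proposition 11. $\partial^+$ satisfies the following:

$$\begin{aligned}
\partial^+(\emptyset) = \partial^+(\varepsilon) &= \emptyset, & \partial^+(\alpha + \beta) &= \partial^+(\alpha) \cup \partial^+(\beta), \\
\partial^+(a) &= \{\varepsilon\} \quad (a \in \Sigma), & \partial^+(\alpha\beta) &= \partial^+(\alpha)\beta \cup \partial^+(\beta), \\
\partial^+(\alpha^*) &= \partial^+(\alpha)\alpha^*, & \partial^+(\alpha \shuffle \beta) &= \partial^+(\alpha) \shuffle \partial^+(\beta) \\
& & &\quad \cup\ \partial^+(\alpha) \shuffle \{\beta\} \cup \{\alpha\} \shuffle \partial^+(\beta).
\end{aligned}$$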

Proof. The proof proceeds by induction on the structure of $\alpha$. It is
clear that $\partial^+(\emptyset) = \emptyset$,
$\partial^+(\varepsilon) = \emptyset$ and, for $a \in \Sigma$,
$\partial^+(a) = \{\varepsilon\}$.

In the remaining cases, to prove that an inclusion
$\partial^+(\gamma) \subseteq E$ holds for some set $E$, we show by induction
on the length of $x$ that for every $x \in \Sigma^+$ one has
$\partial_x(\gamma) \subseteq E$. We will therefore just indicate the
corresponding computations for $\partial_a(\gamma)$ and
$\partial_{xa}(\gamma)$, for $a \in \Sigma$. We also make use of the fact
that, for any expression $\gamma$ and letter $a \in \Sigma$, the set
$\partial^+(\gamma)$ is closed under taking derivatives w.r.t. $a$, i.e.,
$\partial_a(\partial^+(\gamma)) \subseteq \partial^+(\gamma)$.

Now, suppose the claim is true for $\alpha$ and $\beta$. There are four
induction cases to consider.

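- For $\alpha + \beta$ we have
$\partial_a(\alpha + \beta) = \partial_a(\alpha) \cup \partial_a(\beta) \subseteq \partial^+(\alpha) \cup \partial^+(\beta)$,
as well as
$\partial_{xa}(\alpha + \beta) = \partial_a(\partial_x(\alpha + \beta)) \subseteq \partial_a(\partial^+(\alpha) \cup \partial^+(\beta)) \subseteq \partial_a(\partial^+(\alpha)) \cup \partial_a(\partial^+(\beta)) \subseteq \partial^+(\alpha) \cup \partial^+(\beta)$.
Similarly, one proves that
$\partial_x(\alpha) \subseteq \partial^+(\alpha + \beta)$ and
$\partial_x(\beta) \subseteq \partial^+(\alpha + \beta)$, for all
$x \in \Sigma^+$.

- For $\alpha^*$, we have
$\partial_a(\alpha^*) = \partial_a(\alpha)\alpha^* \subseteq \partial^+(\alpha)\alpha^*$,
as well as

$$\begin{aligned}
\partial_{xa}(\alpha^*) = \partial_a(\partial_x(\alpha^*)) &\subseteq \partial_a(\partial^+(\alpha)\alpha^*) \subseteq \partial_a(\partial^+(\alpha))\alpha^* \cup \partial_a(\alpha^*) \\
&\subseteq \partial^+(\alpha)\alpha^* \cup \partial_a(\alpha)\alpha^* \subseteq \partial^+(\alpha)\alpha^*.
\end{aligned}$$

Furthermore,
$\partial_a(\alpha)\alpha^* = \partial_a(\alpha^*) \subseteq \partial^+(\alpha^*)$
and
$\partial_{xa}(\alpha)\alpha^* = \partial_a(\partial_x(\alpha))\alpha^* \subseteq \partial_a(\partial_x(\alpha)\alpha^*) \subseteq \partial_a(\partial^+(\alpha^*)) \subseteq \partial^+(\alpha^*)$.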

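- For $\alpha\beta$, we have
$\partial_a(\alpha\beta) = \partial_a(\alpha)\beta \cup \varepsilon(\alpha)\partial_a(\beta) \subseteq \partial^+(\alpha)\beta \cup \partial^+(\beta)$
and

$$\begin{aligned}
\partial_{xa}(\alpha\beta) = \partial_a(\partial_x(\alpha\beta)) &\subseteq \partial_a(\partial^+(\alpha)\beta \cup \partial^+(\beta)) = \partial_a(\partial^+(\alpha)\beta) \cup \partial_a(\partial^+(\beta)) \\
&\subseteq \partial_a(\partial^+(\alpha))\beta \cup \partial_a(\beta) \cup \partial_a(\partial^+(\beta)) \subseteq \partial^+(\alpha)\beta \cup \partial^+(\beta).
\end{aligned}$$

Also,
$\partial_a(\alpha)\beta \subseteq \partial_a(\alpha\beta) \subseteq \partial^+(\alpha\beta)$
and

$$\partial_{xa}(\alpha)\beta = \partial_a(\partial_x(\alpha))\beta \subseteq \partial_a(\partial_x(\alpha)\beta) \subseteq \partial_a(\partial^+(\alpha\beta)) \subseteq \partial^+(\alpha\beta).$$

Finally, if $\varepsilon(\alpha) = \varepsilon$, then
$\partial_a(\beta) \subseteq \partial_a(\alpha\beta)$ and
$\partial_{xa}(\beta) = \partial_a(\partial_x(\beta)) \subseteq \partial_a(\partial_x(\alpha\beta)) = \partial_{xa}(\alpha\beta)$.
We conclude that $\partial_x(\beta) \subseteq \partial_x(\alpha\beta)$ for
all $x \in \Sigma^+$, and therefore
$\partial^+(\beta) \subseteq \partial^+(\alpha\beta)$. Otherwise,
$\varepsilon(\alpha) = \emptyset$, and it follows from Lemma 10 that
$\partial^+(\alpha) \neq \emptyset$, and that there is some
$\alpha_0 \in \partial^+(\alpha)$ with
$\varepsilon(\alpha_0) = \varepsilon$. As above, this implies that
$\partial_x(\beta) \subseteq \partial_x(\alpha_0\beta)$ for all
$x \in \Sigma^+$. On the other hand, we have already shown that
$\partial^+(\alpha)\beta \subseteq \partial^+(\alpha\beta)$. In particular,
$\alpha_0\beta \in \partial^+(\alpha\beta)$. From these two facts, we
conclude that
$\partial_x(\beta) \subseteq \partial_x(\alpha_0\beta) \subseteq \partial_x(\partial^+(\alpha\beta)) \subseteq \partial^+(\alpha\beta)$,
which finishes the proof for the case of concatenation.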

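- For $\alpha \shuffle \beta$ we have

$$\begin{aligned}
\partial_a(\alpha \shuffle \beta) &= \partial_a(\alpha) \shuffle \{\beta\} \cup \{\alpha\} \shuffle \partial_a(\beta) \\
&\subseteq \partial^+(\alpha) \shuffle \partial^+(\beta) \cup \partial^+(\alpha) \shuffle \{\beta\} \cup \{\alpha\} \shuffle \partial^+(\beta)
\end{aligned}$$

and

$$\begin{aligned}
\partial_{xa}(\alpha \shuffle \beta) &\subseteq \partial_a(\partial^+(\alpha) \shuffle \partial^+(\beta) \cup \partial^+(\alpha) \shuffle \{\beta\} \cup \{\alpha\} \shuffle \partial^+(\beta)) \\
&= \partial_a(\partial^+(\alpha) \shuffle \partial^+(\beta)) \cup \partial_a(\partial^+(\alpha) \shuffle \{\beta\}) \cup \partial_a(\{\alpha\} \shuffle \partial^+(\beta)) \\
&= \partial_a(\partial^+(\alpha)) \shuffle \partial^+(\beta) \cup \partial^+(\alpha) \shuffle \partial_a(\partial^+(\beta)) \cup \partial_a(\partial^+(\alpha)) \shuffle \{\beta\} \\
&\quad \cup\ \partial^+(\alpha) \shuffle \partial_a(\beta) \cup \partial_a(\alpha) \shuffle \partial^+(\beta) \cup \{\alpha\} \shuffle \partial_a(\partial^+(\beta)) \\
&\subseteq \partial^+(\alpha) \shuffle \partial^+(\beta) \cup \partial^+(\alpha) \shuffle \{\beta\} \cup \{\alpha\} \shuffle \partial^+(\beta).
\end{aligned}$$

Now we prove that for all $x \in \Sigma^+$, one has
$\partial_x(\alpha) \shuffle \{\beta\} \subseteq \partial_x(\alpha \shuffle \beta)$,
which implies
$\partial^+(\alpha) \shuffle \{\beta\} \subseteq \partial^+(\alpha \shuffle \beta)$.
In fact, we have
$\partial_a(\alpha) \shuffle \{\beta\} \subseteq \partial_a(\alpha \shuffle \beta)$
and

$$\begin{aligned}
\partial_{xa}(\alpha) \shuffle \{\beta\} &\subseteq \partial_a(\partial_x(\alpha)) \shuffle \{\beta\} \\
&\subseteq \partial_a(\partial_x(\alpha) \shuffle \{\beta\}) \subseteq \partial_a(\partial_x(\alpha \shuffle \beta)) = \partial_{xa}(\alpha \shuffle \beta).
\end{aligned}$$

Showing that
$\{\alpha\} \shuffle \partial_x(\beta) \subseteq \partial_x(\alpha \shuffle \beta)$
is analogous. Finally, for $x, y \in \Sigma^+$ we have
$\partial_x(\alpha) \shuffle \partial_y(\beta) \subseteq \partial_y(\partial_x(\alpha) \shuffle \{\beta\}) \subseteq \partial_y(\partial_x(\alpha \shuffle \beta)) = \partial_{xy}(\alpha \shuffle \beta) \subseteq \partial^+(\alpha \shuffle \beta)$. □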

Corollary 12. Given $\alpha \in T_{\shuffle}$, one has
$\partial^+(\alpha) = \pi(\alpha)$.

We conclude that $\partial(\alpha)$ corresponds to the set
$\{\alpha\} \cup \pi(\alpha)$, as is the case for standard regular
expressions. It is well known that the set of partial derivatives of a
regular expression gives rise to an equivalent NFA, called the Antimirov
automaton or partial derivative automaton, that accepts the language
determined by that expression. This remains valid in our extension of the
partial derivatives to regular expressions with shuffle.

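Definition 13. Given $\tau \in T_{\shuffle}$, we define the partial
derivative automaton associated to $\tau$ by

$$\mathcal{A}_{pd}(\tau) = (\partial(\tau), \Sigma, \{\tau\}, \delta_\tau, F_\tau),$$

where
$F_\tau = \{\, \gamma \in \partial(\tau) \mid \varepsilon(\gamma) = \varepsilon \,\}$
and $\delta_\tau(\gamma, a) = \partial_a(\gamma)$.

It is easy to see that the following holds.

Proposition 14. For every state $\gamma \in \partial(\tau)$, the right
language $\mathcal{L}_\gamma$ of $\gamma$ in $\mathcal{A}_{pd}(\tau)$ is equal
to $\mathcal{L}(\gamma)$, the language represented by $\gamma$. In particular,
the language accepted by $\mathcal{A}_{pd}(\tau)$ is exactly
$\mathcal{L}(\tau)$.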
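The construction of Definition 13 can be sketched as a worklist algorithm that
closes $\{\tau\}$ under partial derivatives. The code below is an illustrative
sketch on a hypothetical tuple encoding, not the FAdo implementation; the names
`build_apd` and `accepts` are assumptions, and expressions are compared
syntactically after simplifying $\varepsilon$ operands.

```python
EPS = ("eps",)


def sym(a): return ("sym", a)
def shuf(x, y): return ("shuf", x, y)
def cat(x, y): return ("cat", x, y)


def nullable(t) -> bool:
    tag = t[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(t[1]) or nullable(t[2])
    return nullable(t[1]) and nullable(t[2])


def cat_set(s, beta):
    if beta == EPS:
        return set(s)
    return {beta if x == EPS else cat(x, beta) for x in s}


def shuf_sets(s, t):
    return {y if x == EPS else x if y == EPS else shuf(x, y)
            for x in s for y in t}


def pd(t, a):
    """Partial derivatives of t w.r.t. the letter a."""
    tag = t[0]
    if tag in ("empty", "eps"):
        return set()
    if tag == "sym":
        return {EPS} if t[1] == a else set()
    if tag == "star":
        return cat_set(pd(t[1], a), t)
    x, y = t[1], t[2]
    if tag == "alt":
        return pd(x, a) | pd(y, a)
    if tag == "cat":
        return cat_set(pd(x, a), y) | (pd(y, a) if nullable(x) else set())
    return shuf_sets(pd(x, a), {y}) | shuf_sets({x}, pd(y, a))


def build_apd(t, alphabet):
    """Return (states, delta, finals) of the partial derivative automaton."""
    states, todo, delta = {t}, [t], {}
    while todo:
        s = todo.pop()
        for a in alphabet:
            delta[(s, a)] = pd(s, a)
            for q in delta[(s, a)] - states:
                states.add(q)
                todo.append(q)
    finals = {s for s in states if nullable(s)}
    return states, delta, finals


def accepts(t, automaton, word) -> bool:
    """Simulate the NFA from the initial state t on the given word."""
    _, delta, finals = automaton
    current = {t}
    for a in word:
        current = set().union(*(delta.get((s, a), set()) for s in current))
    return bool(current & finals)
```

For the shuffle of three distinct letters this yields an automaton with eight
states, in agreement with the count $|\{\tau\} \cup \pi(\tau)| = 1 + (2^3 - 1)$.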

Note that for the REs $\alpha_n$ considered in Examples 1 and 7,
$\mathcal{A}_{pd}(\alpha_n)$ has $2^n$ states, which is exactly the bound
presented by Mayer and Stockmeyer.

5 Average State Complexity of the Partial Derivative 
Automaton 

In this section, we estimate the asymptotic average number of states in
partial derivative automata. This is done by the use of the standard methods
of analytic combinatorics as expounded by Flajolet and Sedgewick [9], which
apply to generating functions $A(z) = \sum_n a_n z^n$ associated to
combinatorial classes. Given some measure of the objects of a class
$\mathcal{A}$, the coefficient $a_n$ represents the sum of the values of this
measure for all objects of size $n$. We will use the notation $[z^n]A(z)$ for
$a_n$. For an introduction of this approach applied to formal languages, we
refer to Broda et al. [4]. In order to apply this method, it is necessary to
have an unambiguous description of the objects of the combinatorial class, as
is the case for the specification of $T_{\shuffle}$-expressions without
$\emptyset$ in (2). For the length or size of a $T_{\shuffle}$-expression
$\alpha$ we will consider the number of symbols in $\alpha$, not counting
parentheses. Taking $k = |\Sigma|$, we compute from (2) the generating
functions $R_k(z)$ and $L_k(z)$, for the number of $T_{\shuffle}$-expressions
without $\emptyset$ and the number of alphabet symbols in
$T_{\shuffle}$-expressions without $\emptyset$, respectively. Note that
excluding one object, $\emptyset$, of size 1 has no influence on the
asymptotic study.

According to the specification in (2) the generating function $R_k(z)$ for
the number of $T_{\shuffle}$-expressions without $\emptyset$ satisfies

$$R_k(z) = z + kz + 3zR_k(z)^2 + zR_k(z),$$

thus,

$$R_k(z) = \frac{1 - z - \sqrt{\Delta_k(z)}}{6z}, \quad \text{where } \Delta_k(z) = 1 - 2z - (11 + 12k)z^2.$$
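The closed form can be recovered by treating the functional equation as a
quadratic in $R_k(z)$. The following worked step is a sketch of that routine
computation; the choice of the root with the minus sign is the one for which
$R_k(z)$ remains finite at $z = 0$.

```latex
3z\,R_k(z)^2 + (z - 1)\,R_k(z) + (k + 1)z = 0
\quad\Longrightarrow\quad
R_k(z) = \frac{1 - z - \sqrt{(1 - z)^2 - 12(k + 1)z^2}}{6z},

(1 - z)^2 - 12(k + 1)z^2 = 1 - 2z + z^2 - 12z^2 - 12kz^2
                         = 1 - 2z - (11 + 12k)z^2 .
```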

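The radius of convergence of $R_k(z)$ is
$\rho_k = \frac{-1 + 2\sqrt{3 + 3k}}{11 + 12k}$. Now, note that the number of
letters $l(\alpha)$ in an expression $\alpha$ satisfies:
$l(\varepsilon) = 0$, $l(a) = 1$ for $a \in \Sigma$,
$l(\alpha + \beta) = l(\alpha) + l(\beta)$, etc. Since each of the three
binary operators contributes $z(L_k(z)R_k(z) + R_k(z)L_k(z))$, we conclude
that the generating function $L_k(z)$ satisfies

$$L_k(z) = kz + 6zL_k(z)R_k(z) + zL_k(z),$$

thus,

$$L_k(z) = \frac{-kz}{6zR_k(z) + z - 1} = \frac{kz}{\sqrt{\Delta_k(z)}}.$$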
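The first coefficients of $R_k(z)$ and $L_k(z)$ can be obtained directly from
the functional equations, read as recurrences on the size $n$. The sketch
below does this with exact integer arithmetic; the function name
`count_expressions` is hypothetical, and the coefficient 6 in the recurrence
for letters reflects the two operand positions of each of the three binary
operators.

```python
def count_expressions(k: int, n_max: int):
    """Return lists r, l where r[n] is the number of expressions of size n
    (without the empty-set symbol) over k letters and l[n] is the total
    number of letter occurrences in those expressions."""
    r = [0] * (n_max + 1)
    l = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        if n == 1:
            r[1] = k + 1  # epsilon and the k letters
            l[1] = k      # each letter contributes one occurrence
            continue
        # Binary operator at the root: operand sizes i and n - 1 - i.
        conv_rr = sum(r[i] * r[n - 1 - i] for i in range(1, n - 1))
        conv_lr = sum(l[i] * r[n - 1 - i] for i in range(1, n - 1))
        r[n] = r[n - 1] + 3 * conv_rr  # star, then +, concatenation, shuffle
        l[n] = l[n - 1] + 6 * conv_lr  # letters may sit in either operand
    return r, l
```

For $k = 1$ the sketch gives 2, 2, 14 and 38 expressions of sizes 1 to 4. The
ratio `l[n] / r[n]` is the exact average number of letters for size `n`, which
can be compared with the asymptotic estimate for moderate values of `n`.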

Now, let $P_k(z)$ denote the generating function for the size of
$\pi(\alpha)$ for $T_{\shuffle}$-expressions without $\emptyset$. From
Definition 3 it follows that, given an expression $\alpha$, an upper bound,
$p(\alpha)$, for the number of elements in the set $\pi(\alpha)$ satisfies:

$$\begin{aligned}
p(\varepsilon) &= 0, & p(\alpha + \beta) &= p(\alpha) + p(\beta), \\
p(a) &= 1 \quad \text{for } a \in \Sigma, & p(\alpha\beta) &= p(\alpha) + p(\beta), \\
p(\alpha^*) &= p(\alpha), & p(\alpha \shuffle \beta) &= p(\alpha)p(\beta) + p(\alpha) + p(\beta).
\end{aligned}$$

This upper bound corresponds to the case where all unions in $\pi(\alpha)$
are disjoint. From this, we conclude that the generating function $P_k(z)$
satisfies

$$P_k(z) = kz + 6zP_k(z)R_k(z) + zP_k(z) + zP_k(z)^2,$$

thus

$$P_k(z) = Q_k(z) + S_k(z),$$

where

$$Q_k(z) = \frac{\sqrt{\Delta_k(z)}}{2z}, \qquad S_k(z) = -\frac{\sqrt{\Delta'_k(z)}}{2z},$$

and $\Delta'_k(z) = 1 - 2z - (11 + 16k)z^2$. The radii of convergence of
$Q_k(z)$ and $S_k(z)$ are respectively $\rho_k$ (defined above) and
$\rho'_k = \frac{-1 + 2\sqrt{3 + 4k}}{11 + 16k}$.
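The decomposition of $P_k(z)$ follows from solving its functional equation as
a quadratic in $P_k(z)$, using the closed form of $R_k(z)$. The worked step
below is a sketch of that computation, with $\Delta_k(z)$ and $\Delta'_k(z)$
as defined in the text.

```latex
% From the closed form of R_k:  6z R_k(z) + z - 1 = -\sqrt{\Delta_k(z)} .
z\,P_k(z)^2 - \sqrt{\Delta_k(z)}\,P_k(z) + kz = 0
\quad\Longrightarrow\quad
P_k(z) = \frac{\sqrt{\Delta_k(z)} - \sqrt{\Delta_k(z) - 4kz^2}}{2z},

\Delta_k(z) - 4kz^2 = 1 - 2z - (11 + 12k)z^2 - 4kz^2
                    = 1 - 2z - (11 + 16k)z^2 = \Delta'_k(z) .
```

Since $\rho'_k < \rho_k$ for $k \geq 1$, the summand $S_k(z)$ carries the
dominant singularity of $P_k(z)$.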


5.1 Asymptotic analysis 

A generating function $f$ can be seen as a complex analytic function, and the
study of its behaviour around its dominant singularity $\rho$ (in case there
is only one, as it happens with the functions here considered) gives us
access to the asymptotic form of its coefficients. In particular, if $f(z)$
is analytic in some appropriate neighbourhood of $\rho$, then one has the
following:

1. if $f(z) = a - b\sqrt{1 - z/\rho} + o\left(\sqrt{1 - z/\rho}\right)$, with
$a, b \in \mathbb{R}$, $b \neq 0$, then

$$[z^n]f(z) \sim \frac{b}{2\sqrt{\pi}}\, \rho^{-n} n^{-3/2};$$

2. if $f(z) = \frac{a}{\sqrt{1 - z/\rho}} + o\left(\frac{1}{\sqrt{1 - z/\rho}}\right)$,
with $a \in \mathbb{R}$, and $a \neq 0$, then

$$[z^n]f(z) \sim \frac{a}{\sqrt{\pi}}\, \rho^{-n} n^{-1/2}.$$

Hence, by item 1 one has for the number of $T_{\shuffle}$-expressions of size
$n$,

$$[z^n]R_k(z) \sim \frac{(3 + 3k)^{\frac14}}{6\sqrt{\pi}}\, \rho_k^{-n-\frac12} (n + 1)^{-\frac32} \qquad (4)$$

and by item 2 for the number of alphabet symbols in all expressions of size
$n$,

$$[z^n]L_k(z) \sim \frac{k}{2\sqrt{\pi}\,(3 + 3k)^{\frac14}}\, \rho_k^{-n+\frac12} n^{-\frac12}. \qquad (5)$$

Consequently, the average number of letters in an expression of size $n$,
which we denote by $avL$, is asymptotically given by

$$avL = \frac{[z^n]L_k(z)}{[z^n]R_k(z)} \sim \frac{3k\rho_k\,(n + 1)^{\frac32}}{\sqrt{3 + 3k}\; n^{\frac12}}.$$

Finally, by item 1, one has for the sum of the bounds $p(\alpha)$ over all
expressions $\alpha$ of size $n$,

$$[z^n]P_k(z) = [z^n]Q_k(z) + [z^n]S_k(z) \sim \frac{-(3 + 3k)^{\frac14} \rho_k^{-n-\frac12} + (3 + 4k)^{\frac14} (\rho'_k)^{-n-\frac12}}{2\sqrt{\pi}}\, (n + 1)^{-\frac32},$$

and the average size of $\pi(\alpha)$ for an expression $\alpha$ of size $n$,
denoted by $avP$, is asymptotically given by

$$avP = \frac{[z^n]P_k(z)}{[z^n]R_k(z)}.$$

Taking into account Proposition 6 we want to compare the values of
$\log_2 avP$ and $avL$. In fact, one has

$$\lim_{n,k \to \infty} \frac{\log_2 avP}{avL} = \log_2 \frac{4}{3} \approx 0.415.$$

This means that,

$$\lim_{n,k \to \infty} avP^{1/avL} = \frac{4}{3}.$$
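The limit can be explored numerically from the radii of convergence. The
sketch below is an illustration under stated assumptions: it uses the
asymptotic estimate of $avL$ from the text and, for $avP$, the quotient of the
coefficient estimates of $P_k(z)$ and $R_k(z)$, which simplifies to
$3\big(((3+4k)/(3+3k))^{1/4}(\rho_k/\rho'_k)^{n+1/2} - 1\big)$; the additive
$-1$ is dropped since it is negligible for large $n$. All function names are
hypothetical.

```python
import math


def rho(k: float) -> float:
    """Radius of convergence of R_k(z)."""
    return (-1 + 2 * math.sqrt(3 + 3 * k)) / (11 + 12 * k)


def rho_prime(k: float) -> float:
    """Radius of convergence of S_k(z)."""
    return (-1 + 2 * math.sqrt(3 + 4 * k)) / (11 + 16 * k)


def av_letters(k: float, n: int) -> float:
    """Asymptotic estimate of the average number of letters (avL)."""
    return 3 * k * rho(k) * (n + 1) ** 1.5 / (math.sqrt(3 + 3 * k) * n ** 0.5)


def log2_av_support(k: float, n: int) -> float:
    """log2 of the dominant term of the estimate of avP, computed in logs
    to avoid overflow for large n."""
    c = ((3 + 4 * k) / (3 + 3 * k)) ** 0.25
    q = rho(k) / rho_prime(k)
    return math.log2(3) + math.log2(c) + (n + 0.5) * math.log2(q)


def ratio(k: float, n: int) -> float:
    """Estimate of log2(avP) / avL for given k and n."""
    return log2_av_support(k, n) / av_letters(k, n)
```

Evaluating `ratio` for growing `k` and `n` gives values that approach
`math.log2(4 / 3)`; the convergence in `k` is slow, so large alphabets are
needed to get close to the limit.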

Therefore, one has the following significant improvement, when compared
with the worst case, for the average case upper bound.

Proposition 15. For large values of $k$ and $n$ an upper bound for the
average number of states of $\mathcal{A}_{pd}$ is
$(\frac{4}{3})^{|\alpha|_\Sigma}$.

6 Conclusion and Future Work 

We implemented in the FAdo system [8] the construction of the
$\mathcal{A}_{pd}$ for REs with shuffle and performed some experimental tests
for small values of $n$ and $k$. Those experiments, over statistically
significant samples of uniformly random generated REs, suggest that the upper
bound obtained in the last section is far from the true average value. This
is not surprising, as in the construction of $\pi(\alpha) \cup \{\alpha\}$
repeated elements can occur.

In previous work [2], we identified classes of standard REs that capture a
significant reduction on the size of $\pi(\alpha)$. In the case of REs with
shuffle, those classes enforce only a marginal reduction in the number of
states, but a drastic increase in the complexity of the associated generating
function. Thus the expected gains do not seem to justify its quite difficult
asymptotic study.

Sulzmann and Thiemann extended the notion of Brzozowski derivative to
several variants of the shuffle operator. It will be interesting to carry out
a descriptional complexity study of those constructions and to see whether it
is worthwhile to extend the notion of partial derivative to those shuffle
variants.

An extension of the partial derivative construction for extended REs with
intersection and negation was recently presented by Caron et al. [5]. It will
also be interesting to study the average complexity of this construction.